Proceedings of the 4th National Big Data Health Science Conference


This proceeding contains some of the oral and poster presentations from the 4th National Big Data Health Science Conference (Columbia, SC, Feb 10-11, 2023) that was organized by the University of South Carolina Big Data Health Science Center (BDHSC; https://bigdata.sc.edu/). The BDHSC is an interdisciplinary enterprise that promotes and supports Big Data health science research through capacity development, academic training, professional development, community engagement and methodological advancement. Starting in 2020, this recurring annual national conference has four primary goals: 1) Create a multidisciplinary scientific venue for the exchange of new concepts, methods, and results to encourage the sharing of theoretical, methodological, and substantive knowledge from Big Data health science research; 2) Identify new issues that are, to date, understudied in this area, and then generate, promote, and support innovation in Big Data health science; 3) Expand impact and scholarly excellence by producing new and interdisciplinary publications; and 4) Promote inclusive excellence in training and mentoring opportunities by engaging and supporting underrepresented junior investigators and students in the conference. This proceeding reflects our efforts in achieving those goals, especially those related to scientific knowledge creation and dissemination.

The U.S. healthcare industry is a complex adaptive system 1 that is constantly changing as a result of technological advancements, aging populations, changing disease patterns, increasing noncommunicable diseases, rising costs, new discoveries for the treatment of diseases, political reforms and policy initiatives. 2 Moreover, big societal outbreaks and events like COVID-19 and climate change put additional and unexpected burdens on healthcare. Novel strategies are needed to bring about much needed change to the complex and evolving U.S.
healthcare system. The exponential growth of healthcare data from various sources and the emergence of advanced information, communication and computational technologies, collectively referred to as "Big Data analytics" (or data science), offer an invaluable opportunity to improve the quality and efficiency of healthcare. 3,4 The NIH-Wide Strategic Plan for FY 2021-2025 leverages data science as one of its cross-cutting themes. 5 Further, NIH's first Strategic Plan for Data Science 6 released in June 2018 suggests that the Big Data approach will uniquely advance our understanding of disease prevention, identification, control and treatment in the coming decades and will be key to reducing national and global health disparities.

While other health-related Big Data conferences offer a deeper dive into their respective areas such as artificial intelligence, 7 machine learning, 8,9,10 and health information technology, 11 the BDHS Conference differs in its approach by filling an important role in promoting Big Data health science and serving as a multidisciplinary platform by bringing all stakeholders together to focus on addressing healthcare challenges and advancing our understanding of the unique solutions Big Data offers the nation's healthcare system. To the best of our knowledge, there are few conferences in which the sole theme is fully focused on Big Data health science. Through its menu of plenary (keynote) sessions, participatory breakout sessions organized by content of Big Data (electronic health records [EHR] data, geospatial data, social media data, genomic data, and artificial intelligence [AI] for sensing and diagnosis), panel discussions, hands-on workshops, special sessions (e.g., NIH grantee session, NIH R25/T35 trainee session, underrepresented minority luncheon), poster sessions, and networking opportunities, this annual conference moves beyond offering one piece of the puzzle and seeks to bridge the gaps between Big Data health science's many crucial
stakeholders including the general public, research communities, government agencies and healthcare providers. The use of Big Data analytics or data science in healthcare has already presented promising results for the generation of value for all healthcare stakeholders in several contexts. However, more technological, organizational, multidisciplinary, and collaborative connectivity among stakeholders from academia, government and industry is critical to realizing the full potential of data science in healthcare. To address these demands and challenges, our conference will continue to provide a multidisciplinary forum for the discussion of state-of-the-art advancements in Big Data health science; promote open discussions surrounding critical questions in Big Data health science with particular emphasis on emerging methods which may contribute to the future of healthcare; and facilitate the exchange of ideas and communication of findings that could shape

Study Objectives: There is a direct relationship between the biological characteristics and functions of a protein and its three-dimensional conformation, which is influenced by its genetic sequence and changes in environmental conditions. Currently, experimental approaches provide the highest quality, accuracy, and reliability in elucidating protein structures, as compared to computational methods. However, in recent years, computational modeling techniques have become indispensable for studying protein structures. One such computational model is AlphaFold, which is one of the most prominent neural network-based models that predicts protein structure solely based on the peptide sequence of amino acids. Although very successful, AlphaFold has demonstrated certain deficiencies in structure determination of proteins. The objective of this work is to develop data analytics techniques in order to identify and correct the points of structural error by AlphaFold. Methods: AlphaFold results of modeling protein templates that are
the targets of the Critical Assessment of Structure Prediction (CASP) community experiments XIV and XV have been extracted from https://predictioncenter.org/casp14/results.cgi along with the corresponding experimentally determined PDB files. In addition, we have developed the PDBMine algorithm, which utilizes PDB (https://www.rcsb.org/) data to provide an approach for the validation and modeling of protein structures. We use PDBMine 1 as a method of assessing the plausibility of model structures provided by any modeling technique such as AlphaFold. In this experiment we used target 1024 for our studies. Results: Figure 1 illustrates dihedral angles obtained from three different sources. The PDBMine results for specific residues (residues 98 and 131) are plotted in blue, and the corresponding dihedral angles from the X-ray and AlphaFold structures are illustrated in green and red, respectively. Figure 1 (a) shows agreement between PDBMine and the X-ray structure, and disagreement between PDBMine and AlphaFold, for residue 98, which has local structure violations. On the other hand, residue 131, which has correct local geometry, demonstrates agreement between the PDBMine, X-ray, and AlphaFold structures. Discussion: PDBMine can be used as a post-processing technique to improve the performance of AlphaFold. First, it can be used to identify the points of structural divergence and second, to correct any errors in structures modeled by AlphaFold. PDBMine can also help to accept or decline a model produced by AlphaFold in order to increase the reliability of the modeled structures.

Study Objectives: Numerous chronic diseases, including atherosclerosis and aneurysms, are rooted in abnormal changes within the human vascular system. However, the manual examination of medical images, such as computed tomographic angiograms (CTAs), for analyzing the vascular system is a time-consuming and exhaustive task. To tackle this challenge, we propose a deep learning model specifically designed to segment
the vascular system in CTA images of patients who undergo surgery for peripheral arterial disease (PAD). Our research focuses on accurately predicting two regions: (1) from the descending thoracic aorta to the iliac bifurcation, and (2) from the descending thoracic aorta to the knees in CTA images, utilizing advanced deep learning techniques. Methods: Using the dataset of 11 patients collected at Prisma Health 1, we utilized a deep learning algorithm to segment the vascular system. Our model architecture follows an encoder-decoder structure similar to U-Net 2, incorporating skip connections from the encoder to the decoder. Additionally, a Transformer 3 is utilized in the bridge between these two main blocks, as illustrated in Figure 1.
The inputs to the model were images with dimensions of 512x512 and three channels, and the output was a mask with dimensions of 512x512 and one channel. We trained the model using a batch size of 40, for 400 epochs, with a learning rate of 1e-3. During training and validation, we utilized Intersection over Union (IoU) as the evaluation metric, while the Dice score was used for testing.
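To make the two evaluation metrics concrete, the following minimal sketch computes IoU and the Dice score for a pair of binary masks. It is an illustration only, not the authors' evaluation code: the function name `iou_and_dice`, the nested-list mask representation, and the tiny 4x4 example masks are assumptions made here, and treating two empty masks as a perfect match is a convention chosen for the sketch.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1."""
    inter = union = pred_sum = truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += p & t       # pixel is foreground in both masks
            union += p | t       # pixel is foreground in at least one mask
            pred_sum += p
            truth_sum += t
    # Assumed convention: two empty masks count as a perfect match.
    iou = inter / union if union else 1.0
    total = pred_sum + truth_sum
    dice = 2 * inter / total if total else 1.0
    return iou, dice


# Hypothetical 4x4 masks (illustrative only, not study data).
pred = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
truth = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
iou, dice = iou_and_dice(pred, truth)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

For a single pair of masks the two metrics are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU on the same prediction; averages over different data splits, as reported in this abstract, need not follow that ordering.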
Results: During training and validation, we achieved average IoU scores of 97.3% and 95%, respectively, for segmenting the vascular system from the descending thoracic aorta to the knees in cross-validation. In the testing dataset, we obtained average Dice scores of 93.3% and 83.4% for (1) segmenting the vascular system from the descending thoracic aorta to the iliac bifurcation and (2) from the descending thoracic aorta to the knees, respectively. These results demonstrate the high accuracy and potential clinical usefulness of our trained model. Figure 2 showcases the outcomes achieved by the trained model when segmenting the vascular system in the testing data. Discussion: Accurately capturing and examining the vascular system enables the identification of various pathological conditions like aneurysms and vascular calcification. Moving forward, our primary objective is to improve the accuracy of segmenting and precisely measuring calcification within the vascular system. This progress will significantly enhance diagnostic precision, facilitate proactive treatment, and ultimately lead to better outcomes for patients in the field of vascular health.

Study Objectives: The rise in usage of electronic health records (EHR) promises the development of predictive algorithms for adverse health outcomes. While much focus has been on the accuracy/discrimination of models, calibration (the agreement between the estimated and true risk of an outcome) is also important in clinical settings.
Calibration is assessed via reliability diagrams and quantified through the miscalibration component of Brier Score decomposition.The methodology relies on binning the predicted probabilities.Historically, the number of bins is selected in an ad hoc fashion.Changing the number alters the appearance of the reliability plot and the values of the metrics in the decomposition.The CORP approach 1 , which generates optimally binned reliability diagrams in an automated way, aims to solve this problem of instability.
Our objective is to assess the effectiveness of CORP and compare the calibration and discrimination of several machine learning methods for predicting three health outcomes of interest: sepsis, respiratory failure, and mortality. Methods: Our training set is a 5% sample of 2018 national Medicare admission data (n = 476,593). Patients aged 18 or older with 12 months of Part A and Part B coverage and without any Part C coverage are considered. The test set is a 5% sample of similar data from 2019 (n = 465,041). Features include ICD-10 2 diagnosis codes, CCSR 3 disease codes, CPT-10 4 procedure codes, and the age and sex of patients. The ICD, CCSR and CPT variables are considered with 365-day and 90-day history.
The models predict the probability of an adverse outcome within 90 days after admission. We investigate the performance of XGBoost, LightGBM, and regularized logistic regression models. Metrics of interest include the Brier Score and the components of its decomposition:

$$\mathrm{BS} = \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\,(p_k - o_k)^2}_{\text{First Term: Miscalibration}} \;-\; \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\,(o_k - \bar{o})^2}_{\text{Second Term: Discrimination}} \;+\; \underbrace{\bar{o}\,(1 - \bar{o})}_{\text{Third Term: Uncertainty}}$$

Here, predicted probabilities are discretized into $K$ bins. In bin $k$, $n_k$ is the number of data points, $p_k$ is the average predicted probability and $o_k$ is the average outcome; $N = \sum_k n_k$ is the total number of data points and $\bar{o}$ is the average outcome over the population (incidence). The uncertainty is inherent to the data itself, so it is independent of the predictive model. Note that two models can have identical Brier Scores yet different miscalibration. CORP uses the pool-adjacent-violators (PAV) algorithm to optimally select the number of bins and their sizes. To each predicted probability, PAV assigns a calibrated probability under the regularizing constraint of isotonicity and interpolates linearly in between to facilitate comparison with the diagonal corresponding to perfect calibration.
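The binned decomposition and the PAV step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the CORP reference implementation: the function names, the equal-width binning, and the toy data are hypothetical, and the PAV sketch returns only the calibrated value per distinct forecast, omitting the linear interpolation that CORP applies between them.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_decomposition(probs, outcomes, n_bins=10):
    """Return (miscalibration, discrimination, uncertainty) with equal-width bins."""
    n = len(probs)
    base_rate = sum(outcomes) / n  # overall incidence
    bins = {}
    for p, y in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins.setdefault(k, []).append((p, y))
    miscal = discr = 0.0
    for members in bins.values():
        n_k = len(members)
        p_k = sum(p for p, _ in members) / n_k  # average predicted probability
        o_k = sum(y for _, y in members) / n_k  # average observed outcome
        miscal += n_k * (p_k - o_k) ** 2
        discr += n_k * (o_k - base_rate) ** 2
    return miscal / n, discr / n, base_rate * (1 - base_rate)


def pav_recalibrate(probs, outcomes):
    """Pool-adjacent-violators: map each distinct forecast to a calibrated value
    that is non-decreasing in the forecast (isotonic fit of the outcomes)."""
    groups = {}
    for p, y in zip(probs, outcomes):
        s, c = groups.get(p, (0, 0))
        groups[p] = (s + y, c + 1)
    blocks = []  # each block: [outcome sum, count, forecasts pooled into it]
    for p in sorted(groups):
        s, c = groups[p]
        blocks.append([s, c, [p]])
        # Merge while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s2, c2, ps2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
            blocks[-1][2] += ps2
    return {p: s / c for s, c, ps in blocks for p in ps}


# Toy data (hypothetical): two forecast levels, eight patients.
probs = [0.25] * 4 + [0.75] * 4
outcomes = [0, 0, 1, 1, 1, 1, 0, 1]
mcb, dsc, unc = brier_decomposition(probs, outcomes)
bs = brier_score(probs, outcomes)
calibrated = pav_recalibrate(probs, outcomes)
print(mcb, dsc, unc, bs, calibrated)
```

In this toy example the identity Brier Score = miscalibration − discrimination + uncertainty holds exactly because every forecast inside a bin has the same value; when forecasts vary within a bin, the binned terms only approximate the score, which is one reason the choice of bins matters.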
Results: Table 1 shows results for the LightGBM mortality model using the standard fixed bin-width calibration approach. The non-monotone variation of the miscalibration as the number of bins increases illustrates the instability mentioned above. Table 2 shows results of the different models for the various outcomes using the CORP approach. LightGBM consistently yields the best results for all the metrics and shows the best calibration by far. Discussion: We have found CORP to be effective in providing stable reliability diagrams and calibration metrics. It is expected that its adaptation in a clinical setting would be valuable and give a rigorous analysis of model predictions.

Results: According to the model interpretability (SHAP), the main predictor of the outcome was age, followed by BMI, diarrhea, hypertension, early stages of kidney disease, diabetes, race, pneumonia, smoking status, and gender in ranking order. Significant differences were found among the mean of the variables mentioned earlier between the two patient groups: (1) expired; (2) survived.
It was also noted that individuals over 65 ('older adults'), including 'males', 'whites', 'Alaska Native Americans', and 'current smokers', were at greater risk of death. On the other hand, BMI, classified as 'overweight' and 'obesity', was a significant indicator of mortality. The study also reported that regular use of medicines ('ARBs' & 'ACEs') to treat high blood pressure and heart failure could reduce mortality. These findings suggested that the model could learn features from each category, such as 'patient characteristics', 'pre-hospital comorbidities', and 'medications', but mostly from 'pre-hospital comorbidities'. Therefore, the model revealed the potential to be effective in measuring 'mortality' while being transparent and reliable. The small scores corresponded to small increases in Root Mean Square Error (RMSE), evidence of better model performance in both seen (train) and unseen (test) cases in the current study.

Conclusion:
The performance of the study model is consistent with other Machine Learning (ML) tools 1,2 used in various health domains. AI can potentially provide healthcare workers with the ability to stratify patients and streamline optimal care solutions when time is of the essence and resources are limited. This work sets the platform for future work that forecasts patient responses to treatments at various levels of disease severity, identifies patients at high risk of developing long-term complications, and assesses health disparities and patient conditions that promote improved health care in a broader context.

The geographic area of interest was limited to block groups within ten neighboring counties in South Carolina. For each block group, the following variables were calculated: number of patients with a given SDoH need (five variables); number of CBOs servicing each SDoH need (five variables); and the mean number of ED, IP, and PCP visits (three variables). The Getis-Ord Gi* statistic was used to identify block groups that were hotspots for each of these variables (food insecurity example, Figure 1). First, chi-square tests detected whether there was an association between hotspots for a given SDoH need and hotspots for the respective CBOs. Then, chi-square tests were run between each SDoH need hotspot and each of the healthcare resource use hotspots. Results: For tests between SDoH needs and respective CBOs, significant associations were found for food insecurity and food-based CBOs (p-value = 0.035) as well as social isolation and its respective CBOs (p-value = 0.048, Table 1). Hotspots for IP visits were associated with SDoH hotspots for financial instability and both measures of social connectedness; however, the associations were not significant after adjusting for multiple comparisons. No significant associations were found between other categories of healthcare resource use (ED, PCP) and SDoH hotspot status. Discussion: Identifying hotspot associations can generate hypotheses about the relationship between SDoH needs, the CBOs working to mitigate such needs, and the impact these community features have on healthcare resource use needs. For example, if there is a lack of CBOs for financial instability, then that might contribute to increased IP visits. Moreover, hotspots of food insecurity overlapping with food-oriented CBOs might indicate that CBOs are situated in the appropriate communities. More research will assist in ascertaining the underlying mechanisms behind the identified associations, and lack thereof. The current study serves as a starting point for further analysis in identifying the relationships between SDoH and CBO hotspots and how community features may impact healthcare service use. Declaration: Exemption for this study was obtained from the Institutional Review Board at the University of South Carolina because the study was a secondary analysis of de-identified administrative data.
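The hotspot association tests in this abstract can be illustrated with a Pearson chi-square test on a 2x2 table that cross-classifies block groups by whether they are a hotspot for an SDoH need and whether they are a hotspot for the matching CBOs. The sketch below is a hedged example rather than the study's analysis code: the function name `chi_square_2x2` and the cell counts are hypothetical, and omitting a continuity correction is an assumption, since the abstract does not say which variant was used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: SDoH-need hotspot yes/no. Columns: CBO hotspot yes/no.
    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical block-group counts (illustrative only, not the study's data).
stat, p = chi_square_2x2(12, 18, 30, 140)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

When many need-by-resource pairs are tested, as here, each p-value should be read against a threshold adjusted for multiple comparisons; the abstract does not name the adjustment it applied.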

Fig. 1 (abstract O6). Comparison of food insecurity hotspots and respective CBO hotspots
Objective: Many women visit multiple providers over the course of their pregnancy, making identifying the predominant provider difficult. Using claims data to identify a predominant prenatal care (PNC) provider is not always straightforward, yet it is essential for assessing access 1, cost 2, and outcomes 3. Previous algorithms applied plurality 4 (providing the most visits) and majority 5 (providing at least half of all visits) criteria to identify the predominant provider in the primary care setting, but they lacked visit sequence information. Beyond visit frequency, PNC initiation is crucial for PNC quality indices, and the last PNC visit may involve delivery referral. This study proposes an algorithm that includes PNC sequence information to identify the predominant provider and estimates the percentage of pregnancies with an identifiable predominant PNC provider. Additionally, travel distances to the predominant and nearest providers are compared.

Method:
The dataset used for this study consisted of 108,441 live births and 2,155,076 associated claims from 2015-2018 South Carolina Medicaid data, obtained from the South Carolina Revenue and Fiscal Affairs (RFA) Office. The analysis focused on patients who were continuously enrolled in Medicaid throughout their pregnancy and had at least one PNC visit, resulting in 32,609 pregnancies. PNC visits were identified by comparing delivery date with claim date and refined using claim diagnosis and procedure codes as well as specialty. 6 To classify PNC providers, seven subgroups were created based on PNC frequency and sequence information (Table 1). A stepwise algorithm was developed to determine the predominant PNC provider, considering both the frequency of PNC visits (in all scenarios) and the sequence of visits (in scenarios 4, 5, and 6) (Table 1). PNC dispersion information was used as supplementary data, as it is impossible to identify a predominant provider if the number of visits equals the number of providers. The percentage of identified predominant providers was reported.
Chi-square tests were conducted to assess whether the probability of being identified as a predominant provider for a specific subgroup differed from that of the reference group (PNC(M), providing at least half of all PNC visits). Paired t-tests were used to examine differences in travel distance.
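As a rough illustration of how frequency and sequence information can be combined, the sketch below applies the majority and plurality criteria named in the Objective and then falls back on visit order. It is not the stepwise algorithm of Table 1, which is not reproduced here: the function name `predominant_provider`, the rule labels, and in particular the tie-break (prefer the tied provider who gave the first visit, then the one who gave the last visit) are hypothetical choices made for this example. Applying the dispersion rule only when there is more than one visit is likewise an assumption.

```python
from collections import Counter


def predominant_provider(visits):
    """visits: provider IDs in chronological order for one pregnancy.

    Returns (provider, rule), or (None, "unidentified") when no rule applies.
    """
    if not visits:
        return None, "unidentified"
    counts = Counter(visits)
    # Dispersion rule from the text: every visit went to a different provider.
    if len(visits) > 1 and len(counts) == len(visits):
        return None, "unidentified"
    top_count = max(counts.values())
    leaders = [p for p, c in counts.items() if c == top_count]
    if len(leaders) == 1:
        # Majority: at least half of all visits; otherwise plurality: the most visits.
        if 2 * top_count >= len(visits):
            return leaders[0], "majority"
        return leaders[0], "plurality"
    # Hypothetical sequence tie-break among providers tied on frequency.
    if visits[0] in leaders:
        return visits[0], "tie broken by first visit"
    if visits[-1] in leaders:
        return visits[-1], "tie broken by last visit"
    return None, "unidentified"


# Hypothetical visit sequences (provider IDs are placeholders).
for seq in (["A", "A", "B"], ["A", "A", "B", "C", "D"], ["A", "B", "B", "A"], ["A", "B", "C"]):
    print(seq, "->", predominant_provider(seq))
```

The third sequence shows why sequence information raises the share of identifiable providers: frequency alone leaves A and B tied, while the order of visits resolves the tie.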
Results: By applying PNC frequency information, a predominant PNC provider can be identified for 81% of pregnancies. If PNC sequential information is also included, a predominant PNC provider can be identified for 92% of pregnancies (Table 1). The predominant provider was most often identified in the order PNC(U), PNC(E), PNC(M), PNC(MFVI) and PNC(MFVF) (Table 1). Distance was significantly shorter for pregnant women traveling to the nearest visited PNC provider (an average of 5 miles) than to the identified predominant PNC provider.
Discussion: This algorithm provides researchers and policymakers flexibility in identifying the predominant PNC providers. The inclusion of PNC sequential information in the algorithm has increased the proportion of identifiable predominant providers from 81% to 92%, an increase of 11 percentage points. Applying this algorithm reveals a longer distance for pregnant women travelling to their predominant PNC provider than to the nearest PNC provider.

ensuring long-term allograft survival in vivo. However, viability assessment usually requires dye labeling; tissues are not usable after the evaluation. Autofluorescence of intracellular fluorescent coenzymes, such as reduced forms of nicotinamide adenine dinucleotide (NADH) or nicotinamide adenine dinucleotide phosphate (NADPH) and oxidized flavoproteins (FPs), has long been used as a label-free means to study metabolic states of cells 1,2. We previously demonstrated that CV could be assessed by a non-labeling method 3 using two-photon excitation autofluorescence (TPAF) and second harmonic generation (SHG) imaging on rat and porcine cartilage samples, where TPAF/SHG images were merged to form RGB color images with red, green, and blue assigned to FPs, NAD(P)H, and collagen signal channels, respectively. We have also developed deep learning based segmentation and classification algorithms for CV measurement 4. In this presentation, we introduce a new network for CV measurement using the Mask R-CNN architecture 5

To further address the issue of easy access to the data, a user-friendly and intuitive web application interface is being developed using the Flask web framework.
Results: The use of typical statistical analyses (linear and logistic regression) has revealed insignificant correlation between the predictors and the cancer type or patient outcome. However, the use of Decision Trees revealed some interesting relationships that can be used for explainability and reliability of the Machine Learning approaches.
Based on the initial explorations, Decision Trees have demonstrated a performance approaching 80% success in determining the treatment outcomes. Discussion: Our studies provide predictive models that could potentially be used to improve the diagnostic and prognostic power of data collected from patients at presentation. However, black-box AI approaches tend to perform better than explainable approaches, a dichotomy that complicates the deployment of these techniques in medicine and healthcare.
using individual-level data.Leveraging mobile phone-based mental health service utilization (MHSU) data, we estimated the average mental health visit in contiguous US census tracts from 2019 (pre-COVID-19-outbreak) to 2021 (after-COVID-19-outbreak). Two outcomes were measured: average MHSU and MHSU per new depression diagnosis (MHSU-to-need ratio [MnR]).We then investigated the tract-level association between immigrant population proportions and MHSU indicators using mixed-effects linear regression models that accounted for spatial lag effects, time effects, propensity, enabling, and need factors.Neighborhoods with higher immigration percentages had lower MHSU but higher MnR.In detail, neighborhoods dominated by Latino immigration were significantly associated with lower MHSU and MnR, particularly in the U.S. West and South.When the immigrant's country of origin is in Asia, a higher immigrant percentage was associated with a lower MHSU but a higher MnR.These relationships also vary over time.All neighborhoods experienced significant decreases in MnR and MHSU in 2020, but immigrant neighborhoods recover the slowest in 2021 when compared to their counterparts.Our findings demonstrate disparities in MHSU by immigrant status, and these disparities are spatiotemporally heterogeneous.Immigrant-dominated neighborhoods with disproportionately low mental health visits can be monitored in real-time using emerging mobile phone-based MHSU data to precisely provide additional resources to address MHSU disparities.

O13 Using Semantic Web Technology to leverage interoperable clinical decision support system rules: a pathway to interoperable patient records
Xia Jing
Issues: Interoperability is a well-recognized significant barrier to sharing patients' medical records seamlessly [1][2][3]. Although HL7, the set of international standards for communicating clinical and administrative data, plays a critical role in achieving interoperability in healthcare, not all institutions can afford to be compatible with these standards.
Having interoperable clinical decision support systems (CDSS) rules can provide a significant step toward realizing interoperable patient records, which are much more complicated than just CDSS rules.Project: We propose using CDSS ontology, an enabling technology of the Semantic Web, to leverage interoperable CDSS rules, which ultimately determine the behaviors of CDSS.We use resource-constrained primary care settings as the ideal targets and two existing open-source electronic health records systems (OpenMRS 4 and OpenEMR 5 ) to demonstrate the feasibility.Figure 1 shows the conceptual model of the project.
To construct the CDSS ontology 6 , we used manual and automatic approaches that complement each other 7,8 .The manual approach primarily involved expert input and iterative feedback.The automatic approach included natural language processing techniques and neural network architecture with semi-supervised learning and transfer learning to identify entity candidates for CDSS ontology from publications about CDSS. Figure 2 shows the primary approach to constructing the CDSS ontology.Centers for Disease Control (CDC) vaccination schedules (0-18 years) are used to generate machine-readable CDSS rules using unambiguous concepts and relationships provided by CDSS ontology.We are in the process of translating CDC tabular vaccination schedules into CDSS rules that programmers can use.The ontology construction is underway, and the automatic part has made excellent progress.
We use OpenMRS and OpenEMR as testbeds for the CDSS rules.We create a CDSS module for OpenMRS and an enhanced CDSS module for the current version of OpenEMR, which does not currently have management and maintenance functionalities.We aim to make the modules easier to be reused and shared.Meanwhile, FAIR (findability, accessibility, interoperability, and reusability) principles will also be followed 9 .We are comparing different solutions and specifying detailed requirements for the CDSS modules and the tracking mechanisms.The project will create the following artifacts for sharing among the community: a CDSS ontology, machine-readable CDSS rules on the CDC vaccination schedule (0-18 years), a CDSS module for OpenMRS, an enhanced CDSS module for OpenEMR, and an open course on CDSS regarding its design, development, use, maintenance, and evaluation.The course will be organized in the formats of lectures and hands-on sessions.The project is ongoing, and its progress will be shared.

Lessons learned and their implications:
Community engagement is a critical and challenging component in ontology development, evaluation, and adoption, as well as in the lifecycle of machine-readable CDSS rules 10. Moreover, balancing the empowerment of end users by providing flexibility and configuration capabilities and making the module easy to use is also essential and challenging. An example from this case study is providing a complete set of CDSS rules for management and maintenance while ensuring the end users are not overwhelmed. The revised CDSS rules can be validated and verified.

Issues: The COVID-19 pandemic brought many disruptions to the health system, with a marked effect on the HIV care and treatment continuum, causing a reduction in clinic visits and in-person client engagement. 1 Lagos, the most populous city in Nigeria, with an estimated 3.4 million people with HIV (PWHIV) 2, was the worst hit by the COVID-19 pandemic, with 40% of total cases reported in Nigeria. 3 As of September 2021, there were more than 27,000 HIV program-supported sites in the state, which presented a challenge to continuity in HIV treatment and viral suppression. According to the National Communication Commission report, in 2022 there were approximately 154 million mobile internet subscribers in Nigeria, rising to 158 million in April 2023. 4 Leveraging smartphone technology as a tool for delivering accurate and up-to-date information about HIV prevention, testing, treatment, and adherence, 5 we developed and deployed the Jolly-95 App, an innovative strategy for engaging HIV clients to mitigate interruptions in treatment as part of differentiated service delivery models. Project: The Jolly-95 App (Figure 1) is a self-service mobile app designed to sustain patient engagement in care and improve access to HIV support services. Patients with mobile phones were encouraged to sign up to gain access to app features including patient treatment information (antiretroviral regimen, current viral load), appointment scheduling, self-care management, in-app chat, and facilitated referrals to support services. Trained client service operatives provide real-time support for in-app requests and messaging, with end-to-end encryption to ensure data security.
Lessons learned: After 33 weeks of pilot implementation (10th February to 29th September 2022), 1,672 (26%) of 6381 clients who visited the clinic were assisted to download the app.All (100%) had logged in at least once to activate the appointment reminder features.There were more females (69%) than males (31%) among the users, with the greatest proportion (37%) of the users between age 40-49 years and the lowest (8%) between ages 10-29 years.The in-app chat was the most used feature (86%).Feedback via in-app chat showed that clients found the app relevant to their care and made remote access to HIV services easier.Of the 1,672 users, 1573 (94%) were still active in treatment six months after downloading the app.Of 1,044 users who had viral load tests done during the review period, 982 (94%) were virally suppressed.Limitations encountered in utilization included privacy concerns expressed among clients, and data cost to clients.
Integrating the Jolly-95 app into the HIV program can enhance patient-centered care and service quality, which will result in improved patient outcomes including continuity in treatment. Additional modules to address HIV self-testing integration, and community pharmacy and facility locator services, are in development to enhance care and expand access.
1). Following our assessment of model accuracy, we successfully predicted WNND incidence for the year 2022 (Figure 2).
Discussion: This study, utilizing Bayesian inference, is one of the first studies to predict human mosquito-borne disease for the continental USA at the county level, and introduces concepts that have application for future studies. This proof-of-concept mathematical, geospatial modeling approach has proven utility for national health agencies seeking to allocate funding and other resources for local vector control agencies tackling WNV and other notifiable arboviral agents.
A research infrastructure for using EHR data enables collaborations in regional and national data networks, as well as AI-based research with a focus on unstructured data in clinical records. We have used different types of clinical text classifiers in a variety of clinical domains, using both traditional machine learning and deep learning algorithms. This work demonstrates the often-superior performance of deep learning algorithms such as convolutional neural networks and highlights associated challenges related to interpretability of the results. We have demonstrated that deep learning text classifiers are highly effective for e-phenotyping tasks. Their effectiveness with predictive tasks, such as the prediction of suicidal behavior, is competitive when compared with traditional models using structured EHR data, albeit not as effective as in phenotyping applications. We have also demonstrated that text mining approaches, such as variable importance or word overrepresentation analyses, may yield insight into the characteristics of phenotype-associated keywords in clinical text and possible symptomatology, e.g., the appearance of words such as "smell", "taste", "loss" in a cohort of patients who tested positive for COVID-19. The de-identification of clinical text using established NLP methods does not seem to reduce the performance of these classifiers.
2]. There is no centralized system to exclusively gather and monitor KP data for HIV surveillance and programming 3. The Boloka data repository is being developed as a mechanism for KP data storage. Harnessing big heterogeneous data on HIV for KPs is necessary to improve our understanding of HIV among KPs and to assist in setting and monitoring programme targets [3][4].

Objective:
The overall objective of the study is to leverage and collate existing available KP HIV-related data in South Africa from 2000 onwards.
Methods: To achieve the stated objective, this study will undergo several stages as reflected in Figure 1.
Progress Update: This study is in the preliminary stages. Ethics approval from the University of Johannesburg Faculty of Health Sciences Research Ethics Committee has been secured and will be renewed yearly as required. A transdisciplinary and multi-institutional study team is in place. A project steering committee has been established. Research assistants, postgraduate students, and post-doctoral fellows who will utilize the data have been recruited, onboarded, and trained. There has been engagement with stakeholders to develop meaningful partnerships to facilitate collaboration and data sharing (stage 1). Data have been harnessed, including routinely collected HIV programmatic data, published research data, and technical reports (stage 2). The data received were checked for accuracy, relevance, and quality to enable high impact analyses (stage 3). The data have been placed in a staging area prior to being stored in REDCap (stage 4). The University of Johannesburg (UJ), through its Information and Communication Services (ICS), secured a REDCap license from the REDCap consortium. The license grants UJ permission to utilise the REDCap software, along with access to the consortium's support tools and resources. Currently, ICS is in the process of implementing the REDCap system in compliance with ICS Security standards and adhering to the best practices recommended for REDCap. The Consortium Resources link has been shared to provide access to REDCap Training Materials. Through this platform, KP data will be securely uploaded into a centralised storage area that is managed and protected. Furthermore, authorised users and stakeholders will have the capability to generate customised reports and export the data to applications such as STATA for further dissemination (stage 5). Initial secondary data analyses using analytic methods attuned to the structure of available data, including cross-sectional and longitudinal analyses, are being conducted to improve our understanding of HIV among KPs for a targeted response.

O24 Association between patient-provider shared decision-making and use of pain-related complementary and integrative health modalities among adults with chronic noncancer pain, 2010-2017
Yiwen Shih 1, Peiyin Hung
Results: On the testing set, our model achieves an f1-score of 91.1%, recall of 91.9%, and precision of 90.4%. These results are meant to describe a within-animal generalization based on the partitioning scheme described above. Our previous model described in [2], trained and validated using the same scheme described here, performs with an f1-score of 84.5%, recall of 84.3%, and precision of 84.7%.
Discussion: Deep learning can automatically learn information-rich features from raw single-channel EEG signals that support downstream sleep stage classification performance. Confounds such as leakage of information across training and testing sets result from the concatenate-shuffle-and-partition validation scheme used here; therefore, an in-depth evaluation of the model featuring a cross-validation scheme is still required. Further, the generalizability of sleep staging models across species remains under-studied. We incorporate local epoch transition dynamics information via a bidirectional LSTM; however, the size of the neighborhood (like the receptive field), and even the use of attention-based models, remain to be evaluated. The classifier proposed here shows promise as an accurate, reliable, and generalizable automatic method for sleep staging.
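As a quick consistency check on the sleep-staging metrics reported above, the F1-score can be recomputed as the harmonic mean of precision and recall. This is a minimal sketch; the function name `f1_from_precision_recall` is illustrative, and if the reported scores are averaged across sleep stages the harmonic-mean relation need not hold exactly.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, converted to fractions.
current_model = f1_from_precision_recall(0.904, 0.919)
previous_model = f1_from_precision_recall(0.847, 0.843)
print(f"current: {current_model:.3f}, previous: {previous_model:.3f}")
```

Both recomputed values agree with the reported F1-scores to three decimal places.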
are of African descent in comparison with the entire population. The hypothesis is set to evaluate whether there will be a significant difference in the onset of the disease by race for alleles and genotypes.
Method: This relatability is assessed by preprocessing the data and completing statistical analyses such as computing frequency and distribution tables for the racial groups classified in the data. The design of this study is to use biomarker and participant data of all visits for all participants provided by the National Alzheimer's Coordinating Center (NACC) to complete an association-based analysis to determine the relatability of age, race, allele and gene distribution in the reported Alzheimer's data.

Results:
The population studied had a total of 172,026 records, 86% of which were for white subjects, while African American subjects accounted for 14%. Results found that overall age did not change the distribution of the gene but may provide a basis for determining whether race was a factor. One of the results of our study showed the differences in race and the APOE gene (Figure 1). Analysis shows that the interquartile range of values for African Americans was statistically similar to that for Pacific Islanders but greater than that for whites, supporting the overall argument of a different number of genes but not the median gene allocation. Another statistical analysis that was run compared the populations' race data with the allele distribution.
For each race, we reviewed the number of e4 alleles associated. Data frequencies show African Americans generally have more of their population containing fewer alleles while Caucasians tend to be more numerous overall (Figure 2). Finally, allele and gene data did not reflect the same relationship comparison by race. Within the African American demographic, as the number of alleles increases, the ratio of alleles to subject population differed from that of Caucasians.
Discussion: Alzheimer's Disease (AD) is a neurological disorder that affects memory in the brain. It is one of the causes of dementia and inhibits the body's ability to perform independent living. African Americans remain understudied in this disease, primarily because little research has examined how it presents in African Americans. This data and its accompanying results afford the ability to consider utilizing machine learning as a tool for identification of delineating factors. This approach can also be used to import environmental factors for continued investigation. With improvements in this work and others, relationships between APOE associated with onset, APOE distributions amongst races, prediction of causal candidate genes and increased efficiency in early identification of risks and patterns are possible. This will lead to further advancements in remediating the effects this disease has on people of color.
2][3][4] The disparities are largely attributed to biologic factors, early polyp initiation and more aggressive progression among the Black population. To study racial differences in polyps, a dataset consisting of polyp features of a large cohort was analyzed using traditional regression and showed no racial differences in polyp likelihood and features except for a few attributes on which Whites had more unfavorable status. Because of the large number of polyp features with low frequencies, and the structural challenges of traditional regression to identify features among them in
distinguishing two race groups, this study uses machine learning (ML) to address the research question. Identifying specific polyp features with high cancer risk, if found disproportionately among Blacks, may help accelerate CRC prevention equity by providing detailed polyp status criteria for increased surveillance.
Methods: The study will evaluate the performance of multiple supervised ML methods to identify the most important polyp features that distinguish two race groups. Data on 29,425 patients, with a total of 48,761 polyps, who had a screening colonoscopy at an endoscopy center in South Carolina from September 2001 to July 2016 were studied. Of the 29,425, 14,636 patients (7,672 Blacks, 6,964 Whites) with at least one adenoma or one hyperplastic polyp removed were studied using ML to examine polyp differences by race. Seven supervised ML models were evaluated: LR, NB, KNN, SVM, RF, XG-Boost, and AdaBoost. The methods producing the highest performance results were selected, and the most important polyp features that best separated the sample into the two race groups were identified, using Python software.
Results: All 3 samples (total, males and females) were randomly split into 80-20 training and testing sets. The testing dataset was used to produce the confusion matrices, showing LR and AdaBoost to be the best performing models (with highest AUROC and accuracy scores). Using these models, the most important polyp features were: 1) Total polyp burden, among top 10 in all 3 samples, and statistically significant among males and females; 2) Presence of hyperplastic polyp in the right colon, among top 10 in full cohort, and statistically significant in traditional regression in all 3 samples; 3) Total polyp burden in the right colon was important in all 3 samples; 4) Hyperplastic polyp burden in the left colon was important in all 3 samples. Direction of association in traditional statistical regression showed that the Black population has a more favorable polyp profile.

Conclusion:
Overall, the study showed that racial differences in polyp profile are at best marginal or favor the Black population generally. Traditional regression identified marginal differences between the Black and White populations in the total polyp burden. ML largely confirmed these results and extracted additional information on polyp features that are occasionally present but thought to be associated with higher cancer development potential. Findings indicate that removal of all polyps protects against colorectal cancer, and removal is equally effective by race.
Results: Out of 13,154 unique individuals testing positive for chronic hepatitis C from January 1st, 2020, to December 31st, 2020, 9,519 had a positive lab result with no follow-up testing, 3,391 had repeated positive tests, 392 had a pattern indicative of a cure, and 60 had a pattern indicative of reinfection. It should be noted that the last three categories are not mutually exclusive; a person with a string of '1101' would be flagged with all three outcomes, though this sort of pattern was rare.
Discussion: We were able to create a system that automatically processes chronic hepatitis C labs into a cure cascade. This can be used to quickly target patients and providers for follow-up based on their laboratory patterns, as well as output provider and patient phone numbers for easy follow-up, and target high risk groups. In the future we intend to use this system to improve our hepatitis C program. As a caveat, even with highly sensitive tests, it is possible that some individuals were flagged as cleared or reinfected because of false negatives 3. Additionally, because our methodology is contingent on lab testing, we were unable to infer the chronic hepatitis C clearance status of people who did not have follow-up tests that year, which was most of our sample.
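The pattern-based flags in the hepatitis C results above can be illustrated with a short sketch. The abstract does not list the exact rules, so the substring rules below are assumptions chosen only so that the example string '1101' receives all three non-exclusive flags; the function name and flag labels are hypothetical.

```python
def classify_lab_pattern(results: str) -> set:
    """Flag a chronological string of lab results ('1' = positive, '0' = negative).

    Hypothetical rules for illustration only:
      - a single positive with nothing after it -> no follow-up
      - two consecutive positives               -> repeated positive
      - a positive followed by a negative       -> pattern indicative of cure
      - positive, negative, positive            -> pattern indicative of reinfection
    """
    if not results or set(results) - {"0", "1"}:
        raise ValueError("results must be a non-empty string of '0' and '1'")
    flags = set()
    if results == "1":
        flags.add("no_follow_up")
    if "11" in results:
        flags.add("repeated_positive")
    if "10" in results:
        flags.add("possible_cure")
    if "101" in results:
        flags.add("possible_reinfection")
    return flags


print(sorted(classify_lab_pattern("1101")))
```

Because the flags are collected in a set rather than returned as a single label, one person can carry several outcomes at once, which matches the non-exclusive categories described in the text.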

P5 Disparities of length of hospital stay in fall-related injuries in South Carolina
Nihan Fila (nfila@email.sc.edu) 4.

Methods:
The Zero Truncated Negative Binomial (ZTNB) model is an appropriate estimator of the LOS because LOS is over-dispersed count data with no zero values in the response variable 5,6.
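To make the zero-truncation idea concrete, the sketch below writes out the ZTNB probability mass function using only the Python standard library. It assumes the common NB2 parameterization (mean `mu`, dispersion `alpha`, variance `mu + alpha * mu**2`); the abstract does not state which parameterization or software was used, and the function names are illustrative.

```python
import math


def nb_log_pmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of an (untruncated) negative binomial with mean mu and dispersion alpha."""
    r = 1.0 / alpha            # "size" parameter
    p = r / (r + mu)           # success probability
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log1p(-p))


def ztnb_log_pmf(y: int, mu: float, alpha: float) -> float:
    """Log-pmf of the zero-truncated negative binomial: P(Y = y | Y > 0)."""
    if y < 1:
        raise ValueError("a zero-truncated count must be at least 1")
    prob_zero = math.exp(nb_log_pmf(0, mu, alpha))
    return nb_log_pmf(y, mu, alpha) - math.log1p(-prob_zero)


def ztnb_mean(mu: float, alpha: float) -> float:
    """Mean of the truncated distribution: mu / (1 - P(Y = 0))."""
    prob_zero = math.exp(nb_log_pmf(0, mu, alpha))
    return mu / (1.0 - prob_zero)


# Example with made-up parameters (not estimates from the study).
total = sum(math.exp(ztnb_log_pmf(y, mu=4.0, alpha=0.5)) for y in range(1, 2000))
print(f"probabilities over y >= 1 sum to {total:.6f}")
```

Dividing by 1 - P(Y = 0) is what distinguishes the truncated model from an ordinary negative binomial: a hospital stay of zero days cannot appear in inpatient data, so the probability mass is renormalized over the positive counts.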
Aging and being female are expected to be accelerating factors of LOS 7. As seen in the Figure below, in terms of age, those aged 75 and over comprise 28.0% of hospitalized patients due to falls, followed by the 45-65 age group, which makes up 25.3%. The female proportion of hospitalization is 55%. In terms of the admission source, 60.4% of inpatient admissions are through physician referral (PREF), compared to 28.5% for the Emergency Department (ED). The situation is also similar in Europe as "elderly patients make up about 20% of all emergency room visitors in Europe" 8,9.
Results: The model indicates that aging is an accelerating factor of LOS when adjusted for other variables. According to the ZTNB estimates, the older and middle-aged (45-65) groups have 2.52 and 3.07 times the effect on LOS, respectively, relative to young patients (Table 1).
On the other hand, the gender of fall inpatients has the opposite impact on LOS to that initially predicted, with the estimate for females being 0.92 times that of their male counterparts. Hence, it is plausible that behavioral patterns among young males make them subject to more fall-related injuries. Furthermore, research on the demographic segmentation of American people between males and females suggests that the gap in the increasing population is in favor of females but getting narrower 10. As an admissions source group, PREF is 1.325 times more likely to affect LOS than the reference group of ED, even after taking into account the effect of predictors.
From a public health perspective, the interrelationship between the referral system and ED admissions in this model is beneficial for policymakers to improve health services by developing social response policies and improving effectiveness for patients of all ages, with special attention to middle-aged and elderly people.

Conclusion:
This study suggests that, as accidental falls are one of the 10 leading causes of death in both S.C. and the U.S., comprehensive health policy covering falls should be differentiated by age group, gender, and the interrelation between the referral system and ED admissions. In particular, policies that prevent the decrease in quality of life caused by a disability as a result of falls should be evaluated with environmental and socio-economic factors.

P6
Socio-economic and marital status differences in the uptake of HIV testing in Tanzania: analysis of the 2016-2017 Tanzania HIV impact survey
Salome-Joelle Gass 1, Peiyin Hung 1,2, Jan Ostermann 1,3,4
Introduction: Universal HIV testing is a key step toward achieving the UNAIDS goal of ending the AIDS epidemic by 2030. 3][4][5][6][7] However, evidence is mixed across geographic contexts, making it difficult to develop targeted HIV testing interventions. This study aims to identify the association of marital status and wealth with HIV testing among youth and adults in Tanzania.
Methods: This secondary data analysis used data from a nationally representative sample of 38,680 individuals who participated in the 2016-2017 Tanzania HIV Impact Survey. Weighted logistic regression adjusted for clustering was used to model the association of HIV testing with marital status and wealth, accounting for age, gender, education, and urbanicity of residence. Ever having been tested for HIV and having been tested for HIV in the past 12 months were considered primary outcomes. Post hoc marginal effects estimates were used to calculate probabilities of ever and recent HIV testing by wealth quintile and marital status.
Results: In 2016/17, 67.38% of respondents reported having ever been tested for HIV. The highest wealth quintile had 57.3% (Table 1; aOR = 1.573, CI = 1.227, 1.952) higher odds of ever being tested for HIV when compared to the lowest wealth quintile. However, the fourth wealth quintile had the highest odds of ever being tested. Unmarried individuals had the lowest odds of ever being tested (Table 1; aOR = 0.199, 95% CI = 0.173, 0.230).
Results were similar when looking at having been tested for HIV in the last 12 months, although a smaller difference in odds between unmarried individuals and those married/living together was observed. Interaction analyses indicated that unmarried individuals at all wealth index quintiles had lower probabilities of having ever or recently tested for HIV (Figure 1). Married/living together and widowed/divorced/separated individuals in higher wealth quintiles had a reduction in the probability of testing for HIV; however, this difference was not significant (Figure 1).

Conclusion:
These findings highlight low rates of HIV testing among unmarried and less affluent individuals in Tanzania. Previous studies have shown that wealth correlates with higher levels of HIV knowledge, better access to healthcare, and higher rates of HIV testing. 8 By contrast, individuals in lower wealth brackets face greater logistical and financial obstacles when seeking healthcare. 8 It is possible that repeat testing as a response to higher risk exposure may be the driver of the smaller difference in odds of recent HIV testing between unmarried and married individuals. 9 Furthermore, a focus of HIV testing programs on the poor, and stigma-related concerns about potential loss of social status, may contribute to lower rates of recent HIV testing among higher wealth indices. 10 Future HIV testing strategies in Tanzania should target the needs and preferences of unmarried, less wealthy, and wealthier married or previously married individuals.
Objective: In the United States of America, 1 in 10 people have been diagnosed with diabetes. 1 Prolonged hypoglycemia can lead to severe complications including seizures, coma, and death. 2 Antidiabetic agents like insulin, 2 GLP-1 agonists, 3 DPP-4 inhibitors, 4 sulfonylureas 4 and SGLT-2 inhibitors 5 have all been known to cause hypoglycemia, but no study has systematically compared hypoglycemia associations between different drug classes. The objective of this study was to evaluate the association between antidiabetic agents and hypoglycemia using the FDA Adverse Event Reporting System (FAERS).
Methods: FAERS reports from January 1, 2004 to December 31, 2021 were included in the study. Reporting odds ratios (RORs) and corresponding 95% confidence intervals (95% CI) for the association between antidiabetic agents and hypoglycemia were calculated. An association was considered to be statistically significant when the lower limit of the 95% CI was greater than 1.0.
Results: A total of 14,467,159 reports (including 78,630 hypoglycemia reports) were considered after inclusion criteria were applied. Thirty antidiabetic agents were evaluated, and all of them were significantly associated with hypoglycemia.
Discussion: The antidiabetic agents with the strongest association with hypoglycemia were pramlintide, insulin, acarbose, repaglinide, chlorpropamide, glyburide, exenatide, glimepiride, tolazamide, and lixisenatide. Knowing which classes of antidiabetic drugs are most prone to cause hypoglycemia will help inform clinicians on the most appropriate treatment plan for their patients.
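The reporting odds ratio and the significance rule described in the Methods above can be sketched as follows. This uses the standard 2x2 disproportionality formula with a Wald interval on the log scale; the abstract does not specify the exact computation, and the counts in the example are invented for illustration rather than taken from FAERS.

```python
import math


def reporting_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """ROR and approximate 95% CI from a 2x2 table of spontaneous reports.

    a: reports with the drug of interest and the event (hypoglycemia)
    b: reports with the drug of interest and any other event
    c: reports with all other drugs and the event
    d: reports with all other drugs and any other event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


def is_significant(lower_limit: float) -> bool:
    """Rule from the text: significant when the lower 95% CI limit exceeds 1.0."""
    return lower_limit > 1.0


# Hypothetical counts, for illustration only.
ror, lower, upper = reporting_odds_ratio(a=20, b=80, c=100, d=9800)
print(f"ROR = {ror:.2f} (95% CI {lower:.2f}, {upper:.2f}); "
      f"significant: {is_significant(lower)}")
```

Working on the log scale keeps the interval asymmetric around the ROR and guarantees a positive lower limit, which is why the threshold can be applied directly to that limit.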
Study Objective: Emergency Department (ED) visit rates in South Carolina (SC) present a significant variance, with a considerable 34% of residents inhabiting rural areas where medical resources are often insufficient. Our study aims to quantify the association between factors such as chronic disease prevalence, socio-economic indicators, health behaviors, and the rate of ED visits. This data can help identify ED hotspots and inform the efficient allocation of limited resources.
Methods: Data on Emergency Room (ER) visits and population numbers for 2019 were extracted at the ZIP Code Tabulation Areas (ZCTA) level from the South Carolina Revenue and Fiscal Affairs Office (RFA) and the American Community Survey (ACS). The Behavioral Risk Factor Surveillance System (BRFSS) supplied prevalence data on chronic diseases (e.g., depression, arthritis, asthma, hypertension, diabetes) and health-related behaviors (e.g., insurance coverage, exercise, binge drinking, smoking, annual physical exams). A multivariable regression model was utilized to quantify the association between ED visit rate and chronic diseases.

Conclusion:
Our study suggests that areas with high HIV or diabetes prevalence require increased ED resource support. In the effort to reduce ED visit rates, which often reflect residents' health status, controlling HIV and diabetes may have a direct impact. However, long-term strategies such as promoting higher education, encouraging employment, and implementing anti-smoking campaigns may also play a significant, albeit more gradual, role in improving public health.

P10 "You Want to Use My Data?!?": how can patient engagement and outreach enhance big data analytics?
Ariana Mitcham 1,2, Conor O'Boyle 2,3, Ginny Cartee 2, Katie Parris 2, Ann Blair Kennedy 2,3,4, Nabil Natafgi
The PES has various condition-specific and disease-agnostic panels that reflect the rich experiences of individuals with lived health conditions.
The diversity of the PES panels provides a unique perspective to encourage engagement, while effectively and efficiently communicating data on respectful and culturally competent research. Patients and caregivers in the PES are involved in various team-building activities and training sessions on understanding various research terms and methodologies to facilitate their interaction with researchers and clinicians. Those patients/caregivers are referred to as Patient Experts as they are experts in living with the health condition (e.g., Long COVID or diabetes). Since its inception in 2016, the PES has trained more than 100 patients and partnered with over 350 researchers associated with nearly 50 institutions nationwide. Specifically, the PES has worked with researchers seeking feedback on projects that include social media, EHR extractions, genomics, geospatial components, AI, and health information technology.
Lessons Learned: Higher levels of engagement require increasingly more resources, and patient feedback has typically not been seen as a part of big data research (Figure 1, adapted from Manafò et al. 2018). However, the PES has learned best practices for providing feedback to research teams from research question formulation through results dissemination. The PES has recognized the next steps for further incorporating patient feedback into big data projects, including advising big data researchers on the importance of patient feedback for these types of projects. The most efficient avenue for PES expansion will be identifying big data researchers who are champions of co-developing projects with patient engagement.
option. This said, compliance with recurring annual screening is crucial and can be a struggle for medically underserved and uninsured patients, the specific population for this study. We reviewed data from FIT test results collected from 2017 to 2020. Specifically, we looked at patients with FIT negative results to see how many completed their FIT test the following year (as per recommendations). Subsequent FIT compliance results are as follows: 3 out of 29 patients screened in 2017 returned for their annual FIT screening in 2018, with an annual FIT compliance rate of 10.34% (Figure 1). Similarly, 5 out of 44 patients screened in 2018 returned in 2019, 19 out of 159 patients screened in 2019 returned in 2020, and 34 out of 175 patients screened in 2020 returned for their annual FIT screening in 2021. Annual compliance for fiscal year groups 2017, 2018, and 2019 was similar. However, patients who had their inaugural screening in fiscal year 2020 showed a 7.48% increase in compliance with annual FIT as compared to those who started in the previous year. This increase is statistically significant with an odds ratio of 1.63, a Z value of 1.87, and a p-value of 0.030. In 2020 the CCPN implemented new process improvement strategies that centered around virtual patient navigation. This virtual process included video conferencing, mailed-in FIT, virtual educational resources, and instructional videos. Additionally, we began sending out reminders stating, "It's time for your annual FIT." The combination of these two factors may have contributed to the observed increase in patient compliance. We will monitor patient compliance to determine whether the increase in compliance continues. If compliance with annual FIT remains low, we may consider FIT screening to be less than effective in low-income uninsured individuals.
Results: For all patients, regardless of zip code, financial resource strain was the most common social determinant of health barrier, followed by food insecurity. Reported financial resource strain among patients residing in 29203 was higher than among non-29203 FMC patients. However, food insecurity, housing insecurity, social connectivity, and transportation needs were roughly uniform across the population regardless of zip code.
Discussion: FMC providers can benefit from the knowledge that a significant portion of their patient population may face financial instability and food insecurity, both of which can greatly impact patient and population health.
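Returning to the FIT compliance counts reported earlier in this section (34 of 175 patients versus 19 of 159), the year-over-year comparison can be reproduced with a short sketch. It assumes a pooled two-proportion z-test with a one-sided alternative, which the abstract does not state; under that assumption the difference in proportions and the z value come out close to the reported 7.48 points and 1.87. Note that the ratio of the two proportions is about 1.63, while the odds ratio computed from the same counts is somewhat larger, so the reported 1.63 may be a ratio of rates.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-proportion z-test; returns (p1, p2, z, one-sided p-value for p1 > p2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return p1, p2, z, p_one_sided


# Counts taken from the abstract: 2020 cohort versus 2019 cohort.
p_2020, p_2019, z, p_value = two_proportion_z(34, 175, 19, 159)
rate_ratio = p_2020 / p_2019
odds_ratio = (34 * (159 - 19)) / ((175 - 34) * 19)

print(f"difference = {100 * (p_2020 - p_2019):.2f} percentage points")
print(f"z = {z:.2f}, one-sided p = {p_value:.3f}")
print(f"rate ratio = {rate_ratio:.2f}, odds ratio = {odds_ratio:.2f}")
```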

P13
Tackling healthcare access by simplifying access to actionable data
Samantha Renaud 1, Qian Huang 2, Songyuan Deng 1, Samantha Slinkard-Barnum 1, Kevin J Bennett
Discussion: By combining multiple types of data into a single index, these maps are a quick and easy way for individuals and organizations to identify areas of greatest need. The publicly available maps can be leveraged to create incentive programs that will drive providers to practice in areas of higher needs. Subsequent hot spot analysis can be conducted to accentuate where placement would result in the greatest impact on access. The results of these Indices also highlight the disparities in healthcare access for rural South Carolinians.
Results: Among respondents in the pooled sample population, 44% received an influenza vaccination. Of those vaccinations, 16% occurred among adults living in the American South compared with 29% elsewhere in the United States. Adults in urban counties of the American South were more likely to have received an influenza vaccination; this difference was statistically significant. When controlling for enabling factors, residing in the American South was no longer significant (p-value=0.67), while residing in a rural county remained significant (p-value=0.00). Having health insurance only slightly decreased the odds of influenza vaccination, yet those who reported having a usual source of care were three times as likely to receive an influenza vaccination. When estimating the full model, accounting for geographic and sociopolitical differences, adults residing in rural counties were the least likely to receive an influenza vaccination and those with a usual source of care were two times as likely to receive an influenza vaccination. Individuals with one of the top five chronic health conditions, older adults, and women were also more likely to have received an influenza vaccination.
Discussion: Having a usual source of care increased the odds of receiving an influenza vaccination, as most influenza vaccinations were received at a doctor's office followed by a supermarket or drugstore. Additionally, older adults and individuals with chronic health conditions are more likely to visit these places. As the American South continues to grapple with the effects of climate change, rural areas will be most impacted. While influenza remains a top cause of mortality in the United States, health policy related to vaccination needs to consider geographic differences in access to healthcare.
Recent literature has demonstrated the ability of deep learning to predict adverse outcomes such as mental health 3, chronic health conditions 4, and maternal morbidity 5, but these studies do not have the wealth and depth of data to justify using deep learning. Multitask learning (MTL) 6 is a form of deep learning that leverages related targets within a dataset to build a single model that has a shared knowledge base 7. These shared insights serve to stabilize and regularize the model, as well as reduce overfitting, especially in the context of electronic health records. 8 In this study we obtained a large and rich maternal health dataset to train an MTL model and to demonstrate the effectiveness of certain improvements to the MTL model that we are developing.

P15
Method: Our provided dataset had 271,233 maternal delivery records with 95 features collected from 2015 to 2021. We further augmented the data with Social Determinants of Health and Social Vulnerability Index statistics from the CDC based on maternal location. We trained two MTL models: one for three morbidity-related tasks and one for six long-term chronic tasks. Morbidity tasks were severe maternal morbidity (SMM), hemorrhage, and eclampsia; chronic tasks were cardiovascular disease, hypertension, diabetes, obesity, mental health, and substance use. Since most of these are rare conditions, we had to mitigate class imbalance via random undersampling and model bias initializations. We created a new method for balancing the training of targets called the Task-Adaptive Loss (TAL) gradient strategy. We evaluated each task with F-score and AUC-ROC.
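The two class-imbalance measures named in the Method above can be sketched in a few lines. The abstract does not describe how they were implemented, so this shows one common approach under stated assumptions: the output-layer bias is initialized to the log-odds of the positive class, and negatives are randomly undersampled to a chosen ratio. The function names, the 1:1 default ratio, and the example counts are all hypothetical.

```python
import math
import random


def initial_output_bias(n_pos: int, n_neg: int) -> float:
    """Bias b with sigmoid(b) equal to the observed positive rate n_pos / (n_pos + n_neg)."""
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both classes must be present")
    return math.log(n_pos / n_neg)


def random_undersample(labels, ratio: float = 1.0, seed: int = 0):
    """Return sorted indices keeping every positive and a random subset of negatives.

    ratio is the desired number of negatives per positive (assumed 1.0 by default).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(neg), int(round(ratio * len(pos))))
    return sorted(pos + rng.sample(neg, keep))


# Illustrative rare outcome: 10 positives among 1,000 records.
bias = initial_output_bias(n_pos=10, n_neg=990)
print(f"initial bias = {bias:.3f}, implied rate = {1 / (1 + math.exp(-bias)):.3f}")

labels = [1] * 5 + [0] * 95
kept = random_undersample(labels)
print(f"kept {len(kept)} of {len(labels)} records")
```

Starting the bias at the log-odds means the untrained network already predicts the base rate, so early training steps are not spent merely learning that the outcome is rare.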
Results: Performance-per-task of the MTL models is greater than single-task DNN and ML models on most tasks, except for two targets (Table 1). The Substance Abuse task had comparatively low performance across the board due to a lack of features that relate to it. The hemorrhage classification performance is similar between all models (within 2% of each other), suggesting that the underlying prediction task shares few insights or similarities with the other two tasks in the morbidity group.

Conclusion:
The model demonstrates the possibility of creating a tool pulling in data from several different domains to help identify the characteristics of obstetric patients at highest risk for poor maternal outcomes. In addition, we will be further refining our improvements (TAL) to apply to multitask learning with particular relevance to health informatics. In doing so, our study results should be informative to SC public health policy makers and to clinicians seeking to assess patient risk.
tering" using normally distributed data, where a cluster represents some part of a study area in which the mean value is higher than the rest of the study area. The Bayesian spatial scan statistic is designed to detect clustering in continuous-valued data that has been collected at different spatial locations. We implement a hypothesis test for clustering using the Bayes factor, in which the alternative hypothesis indicates a cluster of observations for which the means are different from the rest of the data. In order to apply our method, we first identify the most likely cluster as the potential cluster for which the likelihood under the alternative hypothesis is maximized. We conduct a simulation study to evaluate the performance of our method under varying sample sizes, cluster sizes, and observation means. Simulation results consist of the empirical type I error rate for data simulated under the null hypothesis, the empirical power for data simulated under the alternative hypothesis, and the average sensitivity and positive predictive value (PPV) of the test. We observe that the performance of the method improves as the clustering gets stronger and the sample size increases (the rejection rate increases). Furthermore, the performance of the method improves when we have a large cluster versus a small cluster. The motivation behind a Bayesian approach includes the ability to incorporate prior information when available and directly calculate posterior probabilities. Comparing our Bayesian spatial scan statistic to the frequentist spatial scan statistic, we observe that the Bayesian statistic does not seem to have an advantage here. If we had historical data, we might have been able to set informative prior distributions, which could give more power. We may consider this for future work. Other possible directions for future work include examining different priors and reducing computation time.
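The sensitivity and PPV reported in the simulation study above can be illustrated with a short sketch. The abstract does not define these quantities, so the definitions below follow a common convention for scan statistics and are an assumption: sensitivity is the share of true-cluster locations that fall inside the detected cluster, and PPV is the share of detected locations that truly belong to the cluster. The function name and location identifiers are hypothetical.

```python
def cluster_sensitivity_ppv(true_cluster, detected_cluster):
    """Sensitivity and PPV of a detected cluster against the simulated (true) cluster.

    Both arguments are iterables of location identifiers.
    """
    true_set, detected_set = set(true_cluster), set(detected_cluster)
    overlap = len(true_set & detected_set)
    sensitivity = overlap / len(true_set) if true_set else float("nan")
    ppv = overlap / len(detected_set) if detected_set else float("nan")
    return sensitivity, ppv


# Illustrative example: true cluster of four locations, three detected.
sens, ppv = cluster_sensitivity_ppv(true_cluster=[1, 2, 3, 4], detected_cluster=[3, 4, 5])
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

Averaging these two values over simulated datasets gives the kind of summary the abstract describes alongside type I error and power.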

P18 The dose-response associations between physical activity and cognitive function in older Americans in different demographic subgroups
Fanli Yi 1, Carlos Avalos 2, Chelsea Richard 1, Chih-Hsiang Yang

Fig. 1 (abstract O1). PDBMine results for a) residue 98 and b) residue 131 as examples of incorrect and correct local structures, respectively

Fig. 1 (abstract O3). TransONet structure. In the decoder, ResNet-34 is used; in the bridge, a transformer block is used and feeds into the decoder. Skip connection from different stages of encoder samples to decoder to construct the mask

Objectives: The project aimed to identify block group locations in South Carolina exhibiting high levels (hotspots) of social determinants of health (SDoH) needs, community-based organizations (CBOs), and healthcare resource use. The study hypothesized that areas with SDoH hotspots would be associated with resource use hotspots, defined by emergency department (ED) visits, primary care physician (PCP) visits, and in-patient (IP) hospital care. Furthermore, the study investigated whether there are overlaps in hotspots for a given SDoH need and the CBOs designed to aid that need.
Methods: The study sample included Prisma Health patients, aged 18+ years, engaged in ambulatory care and condition management, in-patient case management, or community health in South Carolina's central and northwestern regions. Data were collected for June 1, 2019 through December 31, 2020. Information on patients' SDoH needs and their respective CBOs was taken from the NowPow referral system. The five SDoH needs categories included food insecurity, housing instability/quality, lack of transportation, financial instability, and two indicators of social connectedness. The electronic medical record (EMR) supplied patients' ED, PCP, and IP data. Both patient and CBO addresses were geocoded, and each was linked to a U.S. Census block group.

Fig. 1 (abstract P11). Subsequent FIT compliance by Fiscal Year

P12 Exploring the social determinants of health in 29203
Catherine O'Leary 1, Mark E. Humphrey 2
1 School of Medicine Greenville, University of South Carolina, Greenville, SC, USA; 2 School of Medicine Columbia, University of South Carolina, Columbia, SC, USA
Correspondence: Catherine O'Leary (csoleary@email.sc.edu)
BMC Proceedings 2023, 17(Suppl 19):P12

Introduction: Efforts to improve health equity should include informing healthcare providers about the specific needs of the unique patient population that they serve. Improving the understanding of the social determinants of health (SDOH) of the specific community that physicians are serving allows for targeted interventions based on population needs and prioritization of community partnerships.
Methods: In this study, responses to five specific SDOH screening questions were extracted from Epic for the 8206 patients who visited Prisma Health Family Medicine Center (FMC) at Colonial Drive from March 2021 through May 2022. These responses were examined to determine the most common barriers to health faced by the FMC patient population. The responses from patients residing in the 29203 zip code were compared with those of the rest of the FMC patient population for appreciable differences in SDOH.
Results: For all patients, regardless of zip code, financial resource strain was the most common social determinant of health barrier, followed by

Multitask learning for South Carolina's prenatal maternal care
Edward Tsien 1, Dezhi Wu 1, Ana Lòpez-De Fede 2
1 College of Engineering and Computing, University of South Carolina, Columbia, SC, USA; 2 The Institute of Families in Society, University of South Carolina, Columbia, SC, USA
Correspondence: Dezhi Wu (dezhiwu@cec.sc.edu)
BMC Proceedings 2023, 17(Suppl 19):P15

Study Objectives: Many states in the South rank poorly for preterm birth, low birthweight, and severe maternal morbidity. In South Carolina, the Department of Health and Human Services (SC DHHS) has been making great efforts to implement innovative approaches to reduce maternal morbidity and improve prenatal outcomes with targeted interventions 1. However, these interventions require accurate criteria for determining which potential mothers are at risk 2.

(Suppl 19):O5
Methods: We conducted a retrospective analysis of 5,371 patients hospitalized for COVID-19-related symptoms from South Florida Memorial Health Systems between March 14, 2020, and January 16, 2021. Demographics, patient characteristics, and pre-existing health data in the dataset were collected at admission. We trained a Random Forest classifier to predict mortality for hospitalized patients who were infected with the SARS-CoV-2 virus. Our Institutional Review Board (IRB) approved the study with an exemption from informed consent and a HIPAA waiver. The IRB also determined that this project is exempt from further review.

accuracy, sensitivity, specificity, PPV and NPV are 94.2%, 95.2%, 93.2%, 93.93%, and 95.15%, respectively; for 40 iterations, the mean accuracy, sensitivity, specificity, PPV and NPV are 96.57%, 96.57%, 96.57%, 96.77%, and 96.75%, respectively; for 50 iterations, the mean accuracy, sensitivity, specificity, PPV and NPV using AD are 97.35%, 98.24%, 96.47%, 96.74%, and 98.29%, respectively.
Discussion: The results obtained through gold-standard DNN provide a baseline for more advanced learning approaches in predicting surgical outcome with dMRI measures.

O11 Application of machine learning in predicting breast cancer patient outcome
Ali Firooz 1, Savannah M. Noblitt 1, Julie Martin 2, W. Jeffery Edenfield 2, Anna Blenda 2,3, Homayoun Valafar 1
1 Department of Computer Science and Engineering, College of Engineering and Computing, University of South Carolina, Columbia, SC, USA; 2 Prisma Health Cancer Institute, Greenville, SC, USA; 3 School of Medicine Greenville, University of South Carolina, Greenville, SC, USA
Correspondence: Homayoun Valafar (homayoun@cse.sc.edu)
BMC Proceedings 2023, 17(Suppl 19):O11

Study Objectives: Breast cancer is a significant health concern in the United States, ranking as the second leading cause of death. The complex nature of cancer, with its diverse subtypes and heterogeneity, makes accurate diagnosis and treatment challenging. Current medical practices often fail to integrate molecular diagnostics with clinical data, hindering the identification of early disease markers and prediction of patient responses to targeted therapies. The multifactorial nature of cancer, influenced by patient health, co-morbidities, environment, and molecular factors, requires the compilation and presentation of data in an accessible manner for tailored treatments. Unifying and standardizing data is a critical step in utilizing machine learning tools to unravel complex relationships in cancer and enable personalized care 1. The primary objective of this work is to develop a comprehensive and user-friendly repository of cancer patient data.
Methods: A Relational Database Management System has been designed and populated with the first round of clinical data from the Prisma Health Cancer Institute Biorepository of ~6,000 cancer patients with at least 66 different cancer diagnoses 2. Molecular data are available for gene mutations, serum galectin proteins, and glycomic profiles of cancer patients. Mutation status of 50 cancer-critical genes in 1,500 patients, 320 individual patient profiles of 5 serum galectin proteins, and serum and biopsy glycomic profiles of 60 patients have been included and will be expanded. In addition, healthy control values for galectin and glycomic profiles were obtained and added for reference.

Subsequently, more than 25,000 cases of West Nile Neuro-invasive Disease (WNND) have been diagnosed, cementing WNV as a public health priority 2. Given its recent emergence in the United States, high-risk ecologies are largely underdefined, making targeted public health interventions challenging. Therefore, we developed a model to predict county-level WNND human cases in the contiguous USA.
Methods: Using the Centers for Disease Control and Prevention ArboNET WNND data from 2000-2021, we predicted WNND human cases using a Bayesian spatiotemporal negative binomial regression model. The model includes environmental, climatic, and demographic factors, as well as host species distribution. An integrated nested Laplace approximation (INLA) approach was used to fit our model 3. We fit the model by removing variables that were not found to be statistically important individually until all variables were statistically important. We then added the removed variables back in individually, calculated the mean and median square prediction error, and kept those that improved the mean square prediction error. To assess model prediction accuracy, annual counts were withheld, forecasted, and compared to observed values. The validated models were then fit to the entire dataset for 2022 predictions.
Results: After model selection, our final model was able to predict 2021 cases with a median square prediction error of 0.006 cases². After variable selection, the model showed accurate prediction of historical WNND cases in most counties, though the model can be improved on counties with very large populations (Figure 1).

To engage the local Historically Black Colleges and Universities (HBCU) community and underrepresented minority (URM) students at the University of South Carolina and other South Carolina institutions, and to increase the participation of URM students in the data science applied to health arena, the 2023 National Big Data Health Science Conference planning committee invited faculty member advisors and at least 5 URM students from USC, Allen University, Benedict College, Claflin University, Voorhees University and other institutions to attend the 2023 National Big Data Health Science Conference and participate in a URM Academic Career Development and Data Science Applied to Health Luncheon. This inaugural event was held in conjunction with the 4th annual National Big Data Health Science Conference on Friday, February 10, 2023. It was an opportunity for URM students interested in pursuing academic-related careers to gather with URM faculty and leadership; discuss opportunities in data science and the institutional value of diversity and URM involvement in academia; network and connect with potential mentors; and learn about resources and gain insight into the challenges and opportunities URM traditionally encounter when embarking on academic-related career paths. Attendees received complimentary registration to attend the national conference; access to 100% of the conference programming, including presentations by esteemed academic, industry and government leaders and focused breakout sessions and workshops; dozens of networking opportunities; and a seat at the URM Academic Career Development and Data Science Applied to Health Luncheon.
Discussion: The inaugural event attracted 45 attendees from institutions across South Carolina. The majority of attendees were undergraduate students with biomedical or health science related majors. Also in attendance were faculty representatives and advisors from the attending HBCUs, including Benedict College, Claflin University, Allen University and Voorhees University. Leadership from USC was also in attendance. The hour-long working luncheon included presentations from USC leadership and a guided Q&A discussion with URM faculty. This event will be featured again at the 5th Annual National Big Data Health Science Conference on February 2-3, 2024.

text-based approaches. Working with these models is far less daunting than it used to be, thanks to machine learning frameworks such as Google's open-source framework, TensorFlow.
1,2, Fengrui Jing 1,2, Shan Qiao 2, Xiaoming Li 2
Background: Nationwide, racial and ethnic minority students comprise approximately 39% of the college population but earn only approximately 17% of bachelor's degrees and 13% of doctoral degrees in the life sciences 1. The percentage of undergraduate public health degrees conferred to racial/ethnic minority groups decreased from 23% to 18% from 2003 to 2012 2. A promising approach to increasing the diversity of the big data analytics workforce in infectious disease research is to

Leveraging the continuity in treatment dashboard analytics to retain persons living with HIV on ART care and treatment in Nigeria: the Lagos ART surge experience
According to the continuum of care demands, PWHIV are identified, started, and retained on treatment using the UNAIDS 95:95:95 treatment goal 2. In April 2019, through the U.S. President's Emergency Plan for AIDS Relief (PEPFAR), the Centers for Disease Control and Prevention (CDC) launched an 18-month ART Surge program in nine Nigerian states, including Lagos, to rapidly increase the number of PWHIV receiving ART. With HIV prevalence in Lagos at 1.3% and an estimated 120,000 PLHIV, half of whom were not on treatment 3, poor retention has been a concern for the surge program in Lagos. The Centre for Integrated Health Programs, with funding from PEPFAR/CDC, developed an innovative Continuity in Treatment (CIT) retention dashboard to provide near real-time tracking of all PLHIV on treatment and prevent drop-offs, increasing the viral suppression rate, which minimizes HIV transmission, and thereby moving Nigeria closer to epidemic control. Prior to the implementation of the CIT-Retention dashboard (at the end of the second quarter in 2020), the IIT rate remained high at 5.1%, with viral load (VL) coverage and viral suppression at 80% and 95%, respectively. After the CIT-Retention dashboard was implemented, there was a sharp decline in the IIT rate to 2.6% (49% reduction), followed by a continuous and sustained decline in IIT rates to 1.1% by June 2022, with improved VL coverage and viral suppression of 92% and 96%, respectively (see figures 1 and 2). The CIT dashboard was useful for near real-time review and automated analysis of patient-level data for tracking and monitoring of outcomes (e.g., VL eligibility, sample collection, missed appointment and IIT), and improved program decision making.
Conclusion: The CIT-Retention Dashboard can serve as a critical tool to identify and follow up persons in care who are likely to experience IIT, using predictive analysis from historical data. This has the potential to significantly improve retention among PLHIV in care, and consequently improve clinical outcomes.
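The "49% reduction" quoted for the IIT rate is a relative reduction between the pre- and post-dashboard rates. As a small worked check of that arithmetic (the helper name `relative_reduction` is hypothetical and is not part of the dashboard):

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a fraction of `before`."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before


# IIT rates reported in the abstract, in percent.
initial_drop = relative_reduction(5.1, 2.6)    # pre-dashboard vs. first post-dashboard rate
sustained_drop = relative_reduction(5.1, 1.1)  # pre-dashboard vs. June 2022 rate
print(f"{initial_drop:.0%}, {sustained_drop:.0%}")
```

The first value reproduces the reported 49%; the second shows that the June 2022 rate of 1.1% corresponds to roughly a 78% reduction relative to the 5.1% baseline.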

O22 Harnessing big heterogeneous data to evaluate the potential impact of HIV responses among key populations in generalized epidemic settings in Sub Saharan Africa: the Boloka Data Repository
Refilwe Nancy Phaswana-Mafuya 1,2, Edith Phalane 1,2, Katharine S. Journeay 3, Haley I. Sisel 3, Claris Siyamayambo 1,2, Betty Sebati 1,2, Francois Wolmarans 3, Katherine Rucinski 4, Amrita Rao 4, Kalai Willis 4, Xiaoming Li 5, Bankole Olatosi 5, Stefan D. Baral 4
1 South African Medical Research Council/University of Johannesburg (SAMRC/UJ) - Pan African Centre for Epidemics Research (PACER) Extramural Unit; 2 Faculty of Health Sciences, Department of Environmental Health, University of Johannesburg, Johannesburg, South Africa; 3 University of Johannesburg Technology Architecture & Planning, Johannesburg, South Africa; 4 Key Populations Program, Center for Public Health and Human Rights, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD; 5 Big Data Health Science Center, University of South Carolina, Columbia, SC, USA
Correspondence: Refilwe Nancy Phaswana-Mafuya (refilwep@uj.ac.za)
BMC Proceedings 2023, 17(Suppl 19):O22
"The work reported herein was made possible through funding by the South African Medical Research Council through its Division of Research Capacity Development under the Mid-Career Scientist Programme using funding received from the South African National Treasury. The content hereof is the sole responsibility of the authors and does not necessarily represent the official views of the SAMRC."
Background: Key Populations (KPs), including gay men and other men who have sex with men, female sex workers, transgender persons, people who use drugs, and incarcerated persons, bear a much higher burden of HIV compared to other adults of reproductive age

Cognitive impairment has multiple risk factors spanning several domains, but few studies have evaluated risk factor clusters.We aimed to identify naturally occurring clusters of risk factors of poor cognition among middle-aged and older adults and evaluate associations between measures of cognition and these risk factor clusters.

(Table 1). The area under the receiver operating characteristic curve for the predictive model was 0.74 for the crude model and 0.77 for the model adjusted for age, sex, and race.
Conclusion: The model based on selected risk factors may be used to identify individuals at high risk of cognitive impairment.
Key words: Cognition; risk factors; prediction; cluster; machine learning
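For readers less familiar with the area under the ROC curve (AUC) reported above, it can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it directly from that definition; it is illustrative only, the function name `auc` and the toy data are assumptions, and it is not the authors' analysis code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for cases (e.g., poor cognition), 0 for non-cases.
    scores: predicted risk for each participant.
    Ties between a case and a non-case count as half a "win".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with four participants (made-up values, not study data).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in sample size, so for survey-scale data a rank-based implementation from a statistics library would be the more practical choice.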

NHANES 2011-2014 (validation dataset)
* The tertiles were based on the distribution of all participants
† Poor cognition performance was defined as the lowest 10% of the distribution of the MMSE score (NHANES III) or a composite score based on the Consortium to Establish a Registry for Alzheimer's Disease (CERAD), the Animal Fluency (AF), and the Digit Symbol Substitution Test (DSST) tests (NHANES 2011-2014)
‡ Model 1 is the crude model without any adjustments
§ Model 2 adjusted for age, sex, and race

2, Alyssa Guo 2, Jennifer Grier 1, Lauren Fowler 2, Jennifer Springhart 1
1 School of Medicine Greenville, University of South Carolina, Greenville, SC, USA; 2 Wake Forest University School of Medicine, Winston-Salem, NC, USA
Correspondence: Jennifer Springhart (jennifer.springhart@prismahealth.org)
BMC Proceedings 2023, 17(Suppl 19):P2

Abstract
There has been little evidence-based research in the literature examining the relationship among medical students, community outreach, and volunteerism. Get Connected, also called Campus Connect, is an innovative software that provides organizations, such as medical schools, with the tools to track and record individual volunteer hours. Community engagement in medical schools has been shown to strengthen leadership, promote empathy, cultivate civic and social responsibility, and improve medical school performance. Promoting community engagement in medical schools is difficult due to a variety of factors. Some factors include difficulty in recruiting, scheduling, and managing volunteers to facilitate community service projects. By utilizing volunteer management software, medical students can sign up for, track, and increase awareness of in-person and virtual volunteer opportunities. Campus Connect by Galaxy Digital is a volunteer management software used by the University of South Carolina School of Medicine Greenville (USC SOMG) to increase medical student volunteer engagement and track our students' involvement. The school implemented Campus Connect in 2019 and has continued to use it to track students' hours and promote in-person and virtual volunteering opportunities that are posted by the medical school and local community organizations. Students can log hours under 5 sectors: Community Health Improvement Partnerships, Community Medical Services (Free Clinics), School-Based and Educational Initiatives, Social Determinants of Health Initiatives (Food Security, Housing, Economic Stability), and Environmental Initiatives. Furthermore, the logged volunteer hours are all verified by respective organization leadership members. To date, students have logged 8,300.43 hours. Of these hours, 1921.44 were School-Based and Educational Initiatives, 1635.15 were Community Health Improvement Partnerships, 885.50 were Community Medical Services, 468.95 were Social Determinants of Health Initiatives, and 67.00 were Environmental Initiatives. The remaining hours were assigned a sixth category, "Individual", which includes opportunities that were not previously entered or offered. The use of volunteer management software allows community engagement among medical students to be tracked and assessed systematically. Furthermore, it allows volunteer opportunities to be focused on opportunity type (in-person vs. virtual), opportunity focus (Community Health Improvement Partnerships vs. Community Medical Services), and opportunity day and time based on when community engagement is the highest. One limitation associated with tracking volunteer hours is that students may participate in volunteer opportunities hosted by organizations not associated with the institution or local organizations and choose not to log the hours in the volunteer management software. In the future, additional emphasis can be provided to encourage medical students to log hours on Campus Connect. The Campus Connect platform will serve as a database and provide the foundation for any future research analyzing the links between medical students at UofSC SOMG and the impacts of community engagement. The use of Campus Connect has gradually become part of the UofSC SOMG medical student experience, and over time, the volunteer management software will be able to provide the data necessary to map out the impact of community outreach on future healthcare providers.

Introduction: Infectious disease electronic surveillance systems are generally designed to focus on the management of acute disease cases and outbreaks. These diseases are either curable or self-limiting, so long-term follow-up and case management are not necessary. However, chronic hepatitis C infections can persist for years 1, and traditional surveillance systems are not always built to allow ascertainment of the proportion of hepatitis C infections that may have achieved a cure. In 2021 the CDC published Laboratory-based Hepatitis C Virus Clearance Cascade Program Guidance For Local and State Health Departments, which was intended to provide guidelines for classifying chronic hepatitis C events based on laboratory records 2. We began to design a program, using the CDC's guidelines, to quickly infer a patient's current hepatitis C clearance status.
Methods: We first pulled all chronic hepatitis C laboratory tests from the 2020 calendar year using SQL. We excluded AST, ALT, and bilirubin tests since they are not specific to chronic hepatitis C 2. These records were sorted by a unique individual identifier as well as the laboratory specimen collection date. Using SAS 9.4, we deduplicated and categorized these records. Labs with simple text responses indicating positive or negative results were categorized first. Numeric tests were considered positive based on the test parameters, or by standard thresholds where test parameters were not stated. Lastly, genotype tests with a response other than unsatisfactory or insufficient were considered positive. The laboratory records were assigned a binary number based on these results, positive (1) or negative (0). The labs were converted into a binary string for each person. The strings were assessed for patterns potentially indicative of a lack of follow-up (1 with no other tests), clearance (1,0), repeated positive testing (1,1), and reinfections (1,0,1).
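The binary-string step of the hepatitis C methods lends itself to a short illustration. The authors worked in SQL and SAS 9.4; the sketch below is a Python restatement under stated assumptions: the function names are hypothetical, results are assumed to be already sorted by specimen collection date, and the precedence among patterns (reinfection checked before clearance, clearance before repeated positives) is a choice made here, not one given in the abstract.

```python
import re


def to_binary_string(results):
    """Encode chronologically sorted lab results (True = positive) as '1'/'0'."""
    return "".join("1" if positive else "0" for positive in results)


def classify_pattern(pattern: str) -> str:
    """Map one person's binary lab string to a candidate status."""
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0s and 1s")
    if pattern == "1":
        return "lack of follow-up"       # a single positive, no other tests
    if re.search(r"10+1", pattern):
        return "possible reinfection"    # positive, negative(s), positive again
    if re.search(r"10+$", pattern):
        return "possible clearance"      # positive followed by negatives to date
    if "11" in pattern:
        return "repeated positive"       # consecutive positives, no negative after
    return "unclassified"                # e.g., only negative results


# One person's labs in date order: positive, negative, positive.
status = classify_pattern(to_binary_string([True, False, True]))
```

Encoding each person's history as a string keeps the pattern search simple, since one regular expression per status replaces a series of record-by-record comparisons.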

17(Suppl 19):P5
ORCID #: 0000-0003-4278-1290
Introduction: This study aims to analyze the relationship between Length of Stay (LOS) in hospitals and patients' age, gender, race, and admission source for falls and fall-related injuries in South Carolina (S.C.) from 2007 to 2015. Increasing life expectancy, driven by socioeconomic improvements, has raised healthcare expenditures, and the care of fall patients has become a critical issue at the family and societal levels 1,2,3. Since accidental falls are among the 10 leading causes of death in S.C. and the U.S., healthcare policy encompassing falls must be comprehensive for all age groups and should implement improvements in healthcare services

Across 143 ZCTAs, the average ED visit rate is 453 per 1000 people.The lowest ED visit rate is in Fort Mill (ZCTA 29707) at 16 per 1000, while the highest is in Walterboro (ZCTA 29488) at 928 per 1000.
Addressing healthcare access issues is a complex problem that requires buy-in from individuals and organizations with varying understanding of technical data. This project was designed to combine relevant sources of information into easy-to-understand indexed scores to tackle high-priority healthcare access issues in South Carolina. The development of placement indices for providers of primary (PCP) and obstetric/gynecological (OB) healthcare is presented.
Method: Data, including provider license information, inpatient (IP) and emergency department (ED) information, and healthcare facilities, were sourced from various South Carolina based agencies. Additional population information, such as population estimates, number of women aged 15-50, and number of births, was collected from the American Community Survey. The PCP placement score was estimated using provider density, facility density, IP and ED visit rates, and travel distance. The OBGYN placement score was estimated using provider density, facility density, percent of the population that are women of childbearing age, birth rate, and travel distance. Scores were calculated at the state, county, ZCTA, and census tract levels, then standardized using the mean and standard deviation, and ranged from 0-100 (lowest to highest needs). The final index was weighted by the rural population percentage.
Results: Index scores were mapped at the ZCTA level and uploaded to the SC Rural Healthcare Resource Dashboard for public access (see figure 1). Results indicate overlap of need for both types of providers. For example, the Mountain Rest area (29664) on the northwest border of our state had a high PCP (30.85) and OB (69.52) score. Other areas, such as Trenton (29847), which is northeast of Augusta, had a high OB (60.55) score but a lower PCP (27.29) score. Meanwhile, areas such as Columbia and Greenville have low needs for both types of providers.
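The abstract does not give the exact formula behind the placement scores, so the following is only a sketch of one way a standardized 0-100 need index could be assembled from several components. Equal weighting of components, the min-max rescaling step, the function names, and the toy values are all assumptions; the rural population weighting mentioned in the Method is not included.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values using their mean and (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def placement_index(components, higher_means_more_need):
    """Combine per-area components into a 0-100 need score (100 = highest need).

    components: dict of component name -> list of values, one per area.
    higher_means_more_need: dict of component name -> bool; False flips the sign
    (e.g., a higher provider density implies lower need).
    """
    names = list(components)
    n_areas = len(components[names[0]])
    composite = [0.0] * n_areas
    for name in names:
        sign = 1.0 if higher_means_more_need[name] else -1.0
        for i, z in enumerate(z_scores(components[name])):
            composite[i] += sign * z / len(names)  # assumed equal weights
    lo, hi = min(composite), max(composite)
    if hi == lo:
        return [50.0] * n_areas
    return [100.0 * (c - lo) / (hi - lo) for c in composite]


# Three hypothetical areas with made-up values (not data from the study).
scores = placement_index(
    {"provider_density": [5.0, 2.0, 0.5], "travel_distance": [3.0, 10.0, 25.0]},
    {"provider_density": False, "travel_distance": True},
)
```

Under this scheme the area with the most providers and the shortest travel distance scores 0 and the most underserved area scores 100; any real index would need the authors' actual component weights and scaling.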

based mining of biomedical literature: applications for the drug repurposing
Aliaksandra Sikirzhytskaya 1, Ilya Tiagin 2, Joe Magagnoli 1, Tammy Cummings 1, Michael Wyatt 1, Scott Sutton 1, Ilya Safro 2, Michael Shtutman 1
1 University of South Carolina, Columbia, SC, USA; 2 University of Delaware, Newark, DE, USA
Correspondence: Michael Shtutman (shtutmanm@cop.sc.edu)
BMC Proceedings 2023, 17(Suppl 19):P16

Methods: The study evaluated HIV and angiotensin-converting enzyme inhibitors (ACEi) categorized by their ability to cross the blood-brain barrier (BBB). Electronic medical records from October 1999 through June 2022 were included, and demographic, comorbidity, clinical, and mortality data were evaluated. We investigated whether there was an immediate risk of the study outcome of dementia while censoring at 1 and 5 years. Differences among the groups were evaluated using t-tests, chi-square tests, and an adjusted Cox proportional hazards model.
Results: Records from 46,563 patients were included in the study (16,222 in the BBB ACEi cohort and 30,341 in the non-BBB ACEi cohort). Most patients were black males. The average age was 49 and 45 (p-value<0.05) while the

Scan statistics are used to detect spatial clustering. While they were initially developed to detect regions with an excess of binomial or Poisson events, spatial scan statistics have been extended to detect hotspots in other types of data, including continuous data. Spatial scan statistics have also been extended to the Bayesian paradigm for a limited number of data types, including zero-inflated count data and multivariate count data. The Bayesian spatial scan statistic has not been developed for continuous data. Thus, in this work, we develop a Bayesian spatial scan statistic for detecting "areas of clus-

Table 1 (abstract P18). The associations of cognitive function and physical activity in subgroups
A positive relationship between physical activity (PA) and cognition has been found [1][2][3]. However, it is less clear whether there is a dose-response association between PA and cognitive decline or how this association varies among different demographic groups. This study utilized the device-based PA measures and the performance-based cognitive measures from NHANES 2011-2014 to investigate the dose-response association between PA and cognition among US older adults ≥ 60 years.
Methods: The 2011-2014 NHANES survey comprised 2,547 older adults aged 60 years or above. The duration and intensity of PA were self-reported and collected using ActiGraphs. The cognitive tests include the Consortium to Establish a Registry for Alzheimer's Disease Word List Memory Task, the Digit Symbol Substitution Test (DSST), and the Animal Fluency Test [4][5][6][7]. This study used the average daily PA and the average peak 30-minute Monitor Independent Movement Summary (MIMS) to predict the test-specific and combined global Z scores for cognition. The peak 30-minute MIMS is the mean of the 30 minutes with the highest MIMS on each valid wear day, averaged across valid wear days. Sample weight-adjusted multivariable linear regression was used to evaluate the association between cognitive function and PA level. Quantile regressions were used to compare the associations between PA and cognition in various subgroups.
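The peak 30-minute MIMS definition can be made concrete with a short sketch. It assumes per-minute MIMS values are already grouped by valid wear day; the function name `peak_30min_mims` and the handling of days with fewer than 30 recorded minutes are assumptions for illustration and do not reproduce the NHANES processing pipeline.

```python
from statistics import mean


def peak_30min_mims(daily_minute_mims, top_n=30):
    """Average, across valid wear days, of each day's mean over its top_n minutes.

    daily_minute_mims: list of days, each a list of per-minute MIMS values.
    The top_n minutes need not be consecutive.
    """
    daily_peaks = []
    for minutes in daily_minute_mims:
        if len(minutes) < top_n:
            continue  # assumption: skip days with too few recorded minutes
        top = sorted(minutes, reverse=True)[:top_n]
        daily_peaks.append(mean(top))
    if not daily_peaks:
        raise ValueError("no day has enough minutes to compute a peak")
    return mean(daily_peaks)


# Two toy days: minute values 1..40, and 60 minutes at a constant 2.0.
example = peak_30min_mims([list(range(1, 41)), [2.0] * 60])
```

Here the first day's peak is the mean of 11 through 40 (25.5) and the second day's is 2.0, so the participant-level value is their average.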